Demonstration of simple problematic distributions and how to interpret the diagnostics.
library("rprojroot")
root<-has_file(".Workflow-Examples-root")$make_fix_file()
library(cmdstanr)
library(posterior)
options(pillar.neg = FALSE, pillar.subtle=FALSE, pillar.sigfig=2)
library(lemon)
library(tidyr)
library(dplyr)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans", base_size=16))
set1 <- RColorBrewer::brewer.pal(7, "Set1")
SEED <- 48927 # set random seed for reproducibility
Unbounded likelihood without proper prior leading to improper posterior
Univariate continuous x, binary y, and the two classes are completely separable, which leads to an unbounded likelihood.
set.seed(SEED+4)
M=1;
N=10;
x=matrix(sort(rnorm(N)),ncol=M)
y=rep(c(0,1), each=N/2)
data_logit <-list(M = M, N = N, x = x, y = y)
ggplot() +
geom_point(aes(x, y), data = data.frame(data_logit), size = 3)+
scale_y_continuous(breaks=c(0,1))
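We can check the unboundedness numerically: for completely separated data, the Bernoulli log likelihood keeps increasing toward 0 as the coefficient grows. A small sketch in plain R (the toy data and the helper loglik are ours, with the intercept fixed at 0 for simplicity):

```r
# Completely separated toy data: x < 0 whenever y = 0, x > 0 whenever y = 1
x <- c(-2, -1, 1, 2)
y <- c(0, 0, 1, 1)
# Bernoulli log likelihood with logit link (plogis is the logistic function)
loglik <- function(beta) sum(dbinom(y, 1, plogis(beta * x), log = TRUE))
# The log likelihood increases without bound toward 0 as beta grows
sapply(c(1, 10, 100), loglik)
```

Since the log likelihood is strictly increasing in beta, there is no maximum likelihood estimate and, without a proper prior, no proper posterior.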
A simple Bernoulli regression (where we have forgotten to include priors)
code_logit <- root("problems", "logit_glm.stan")
writeLines(readLines(code_logit))
// logistic regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0,upper=1> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
}
model {
y ~ bernoulli_logit_glm(x, alpha, beta);
}
Sample
mod_logit <- cmdstan_model(stan_file = code_logit)
fit_logit <- mod_logit$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 1.0 seconds.
Chain 2 finished in 1.1 seconds.
Chain 3 finished in 1.1 seconds.
Chain 4 finished in 1.2 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.1 seconds.
Total execution time: 4.8 seconds.
There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_logit$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm-202110211821-1-56d486.csv, /tmp/RtmpoIWO9X/logit_glm-202110211821-2-56d486.csv, /tmp/RtmpoIWO9X/logit_glm-202110211821-3-56d486.csv, /tmp/RtmpoIWO9X/logit_glm-202110211821-4-56d486.csv
Checking sampler transitions treedepth.
2340 of 4000 (58%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.
Checking sampler transitions for divergences.
1660 of 4000 (42%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
The following parameters had fewer than 0.001 effective draws per transition:
alpha, beta[1]
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.
The following parameters had split R-hat greater than 1.1:
alpha, beta[1]
Such high values indicate incomplete mixing and biasedestimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.
Processing complete.
We check \(\widehat{R}\) and ESS values.
draws <- as_draws_rvars(fit_logit$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NA | NA | NA |
| alpha | 1.438586e+29 | 1.030470e+26 | 2.902607e+29 | 1.527775e+26 | 5.652304e+20 | 8.341800e+29 | 2.95 | 5 | NA |
| beta | 3.329284e+29 | 2.603185e+26 | 6.752337e+29 | 3.859482e+26 | 1.401509e+21 | 1.946232e+30 | 2.99 | 5 | NA |
Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
Quite clear case
A simple Bernoulli regression with proper prior
code_logit2 <- root("problems", "logit_glm2.stan")
writeLines(readLines(code_logit2))
// logistic regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0,upper=1> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
}
model {
alpha ~ normal(0,10);
beta ~ normal(0,10);
y ~ bernoulli_logit_glm(x, alpha, beta);
}
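The proper prior bounds the posterior: the log prior penalty grows quadratically while the log likelihood is bounded above, so the unnormalized log posterior now has a finite mode even for separated data. A grid-based sketch (the toy data and helper names are ours):

```r
x <- c(-2, -1, 1, 2)  # completely separated toy data
y <- c(0, 0, 1, 1)
logpost <- function(beta) {
  sum(dbinom(y, 1, plogis(beta * x), log = TRUE)) +  # likelihood
    dnorm(beta, 0, 10, log = TRUE)                   # normal(0, 10) prior
}
beta_grid <- seq(0, 100, by = 0.1)
beta_hat <- beta_grid[which.max(sapply(beta_grid, logpost))]
beta_hat  # a finite mode instead of drifting to infinity
```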
Sample
mod_logit2 <- cmdstan_model(stan_file = code_logit2)
fit_logit2 <- mod_logit2$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_logit2$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm2-202110211821-1-184bdd.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211821-2-184bdd.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211821-3-184bdd.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211821-4-184bdd.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
We check \(\widehat{R}\) and ESS values, which in this case all look good.
draws <- as_draws_rvars(fit_logit2$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -2.64 | -2.32 | 1.07 | 0.78 | -4.86 | -1.60 | 1.01 | 831 | 1171 |
| alpha | 6.03 | 5.82 | 2.84 | 2.88 | 1.81 | 11.01 | 1.01 | 753 | 1018 |
| beta | 14.36 | 14.01 | 6.03 | 6.22 | 5.47 | 24.78 | 1.01 | 705 | 880 |
Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
No problems
A simple Bernoulli regression with a proper prior (but we have forgotten to remove an unused parameter declaration)
code_logit3 <- root("problems", "logit_glm3.stan")
writeLines(readLines(code_logit3))
// logistic regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0,upper=1> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
real gamma;
}
model {
alpha ~ normal(0,1);
beta ~ normal(0,1);
y ~ bernoulli_logit_glm(x, alpha, beta);
}
Sample
mod_logit3 <- cmdstan_model(stan_file = code_logit3)
fit_logit3 <- mod_logit3$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 1.5 seconds.
Chain 2 finished in 1.5 seconds.
Chain 3 finished in 1.5 seconds.
Chain 4 finished in 1.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.5 seconds.
Total execution time: 6.2 seconds.
There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_logit3$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm3-202110211822-1-7d17f1.csv, /tmp/RtmpoIWO9X/logit_glm3-202110211822-2-7d17f1.csv, /tmp/RtmpoIWO9X/logit_glm3-202110211822-3-7d17f1.csv, /tmp/RtmpoIWO9X/logit_glm3-202110211822-4-7d17f1.csv
Checking sampler transitions treedepth.
1663 of 4000 (42%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
The following parameters had fewer than 0.001 effective draws per transition:
gamma
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.
The following parameters had split R-hat greater than 1.1:
gamma
Such high values indicate incomplete mixing and biasedestimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.
Processing complete.
We check \(\widehat{R}\) and ESS values.
draws <- as_draws_rvars(fit_logit3$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -6.630000e+00 | -6.310000e+00 | 1.050000e+00 | 7.600000e-01 | -8.66000e+00 | -5.640000e+00 | 1.00 | 1769 | 2207 |
| alpha | 3.300000e-01 | 3.300000e-01 | 6.100000e-01 | 5.800000e-01 | -6.70000e-01 | 1.370000e+00 | 1.00 | 3339 | 2920 |
| beta | 1.250000e+00 | 1.220000e+00 | 7.700000e-01 | 7.600000e-01 | -1.00000e-02 | 2.570000e+00 | 1.00 | 3429 | 2586 |
| gamma | -5.814145e+19 | -7.507785e+18 | 1.291367e+20 | 7.812153e+19 | -3.31251e+20 | 8.772598e+19 | 3.38 | 4 | NA |
Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta","gamma"))
A case where the trace plot is actually useful
mcmc_trace(as_draws_array(draws), pars=c("gamma"))
We add another column to the previous data matrix. Sometimes the data matrix is augmented with a column of 1’s to represent the intercept effect, but in this case that is redundant, as our model has an explicit intercept term alpha.
M=2;
N=1000;
x=matrix(c(rep(1,N),sort(rnorm(N))),ncol=M)
y=((x[,1]+rnorm(N)/2)>0)+0
data_logit4 <-list(M = M, N = N, x = x, y = y)
We use the previous Bernoulli regression model with proper priors.
code_logit2 <- root("problems", "logit_glm2.stan")
writeLines(readLines(code_logit2))
// logistic regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0,upper=1> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
}
model {
alpha ~ normal(0,10);
beta ~ normal(0,10);
y ~ bernoulli_logit_glm(x, alpha, beta);
}
Sample
mod_logit4 <- cmdstan_model(stan_file = code_logit2)
fit_logit4 <- mod_logit4$sample(data = data_logit4, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 3.8 seconds.
Chain 2 finished in 4.3 seconds.
Chain 3 finished in 3.9 seconds.
Chain 4 finished in 3.8 seconds.
All 4 chains finished successfully.
Mean chain execution time: 4.0 seconds.
Total execution time: 16.0 seconds.
The computation time per chain with the original one-column x was less than 0.1 s. Now it is several seconds per chain, which is suspicious.
There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_logit4$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm2-202110211822-1-406cb8.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211822-2-406cb8.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211822-3-406cb8.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211822-4-406cb8.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
We check \(\widehat{R}\) and ESS values, which in this case are ok, but the ESSs are lower than what we would expect from Stan for such a low-dimensional problem.
draws <- as_draws_rvars(fit_logit4$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -82.96 | -82.64 | 1.20 | 0.99 | -85.29 | -81.64 | 1 | 1446 | 1925 |
| alpha | 1.95 | 1.91 | 7.08 | 7.11 | -9.72 | 13.50 | 1 | 1090 | 1172 |
| beta[1] | 2.26 | 2.33 | 7.08 | 7.08 | -9.34 | 13.84 | 1 | 1089 | 1157 |
| beta[2] | -0.28 | -0.27 | 0.25 | 0.25 | -0.69 | 0.12 | 1 | 1561 | 1493 |
Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta[1]","beta[2]"))
And there it is: alpha and beta[1] are super-correlated. We can compute the correlation.
cor(as_draws_matrix(draws)[,c("alpha","beta[1]")])[1,2]
[1] -0.9993102
A correlation close to 1 can also arise for other reasons (see the next example), but one possibility is that two parameters play a similar role in the model. Here the reason is the constant column in x, which we added for demonstration purposes. A constant column can also appear if the predictor matrix is unnecessarily augmented with an intercept predictor, or if the observed data, or the subset of data used in a specific analysis, happens to have only one unique value.
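Such redundancy can be detected before sampling by checking the rank of the design matrix together with the intercept column. A quick sketch (variable names ours):

```r
set.seed(48927)
N <- 5
x <- cbind(rep(1, N), sort(rnorm(N)))  # first predictor column is constant
X <- cbind(intercept = 1, x)           # add the model's implicit intercept column
qr(X)$rank                             # 2, not 3: the columns are linearly dependent
```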
The data are Kilpisjärvi summer month temperatures 1952-2013.
data_kilpis <- read.delim(root("problems","kilpisjarvi-summer-temp.csv"), sep = ";")
data_lin <-list(M=1,
N = nrow(data_kilpis),
x = matrix(data_kilpis$year, ncol=1),
y = data_kilpis[,5])
Plot the data
ggplot() +
geom_point(aes(x, y), data = data.frame(data_lin), size = 1) +
labs(y = 'Summer temp. @Kilpisjärvi', x= "Year") +
guides(linetype = "none")
We use a linear model
code_lin <- root("problems", "linear_glm_kilpis.stan")
writeLines(readLines(code_lin))
// linear regression
data {
int<lower=0> N;
int<lower=0> M;
vector[N] y;
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
real sigma;
}
model {
alpha ~ normal(0, 100);
beta ~ normal(0, 100);
sigma ~ normal(0, 1);
y ~ normal_id_glm(x, alpha, beta, sigma);
}
Run Stan
mod_lin <- cmdstan_model(stan_file = code_lin)
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 1.3 seconds.
Chain 2 finished in 1.2 seconds.
Chain 3 finished in 1.1 seconds.
Chain 4 finished in 1.2 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.2 seconds.
Total execution time: 5.2 seconds.
Stan gives a warning: There were X transitions after warmup that exceeded the maximum treedepth.
We can check other diagnostics as follows
fit_lin$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-1-a28d21.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-2-a28d21.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-3-a28d21.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-4-a28d21.csv
Checking sampler transitions treedepth.
16 of 4000 (0.4%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete.
We check \(\widehat{R}\) and ESS values, which are fine.
draws <- as_draws_rvars(fit_lin$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -38.50 | -38.18 | 1.25 | 1.02 | -40.89 | -37.15 | 1 | 1114 | 1649 |
| alpha | -30.86 | -30.79 | 15.06 | 14.45 | -55.47 | -5.57 | 1 | 1185 | 1337 |
| beta | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 | 0.03 | 1 | 1184 | 1337 |
| sigma | 1.12 | 1.11 | 0.10 | 0.10 | 0.97 | 1.30 | 1 | 1350 | 1287 |
Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
And there it is: alpha and beta are super-correlated. Here the reason is that the x values are in the range 1952–2013, and the intercept alpha denotes the temperature at year 0, which is very far from the data. If the intercept alpha changes, the slope beta needs to change too. The high correlation makes the inference slower, and we can make it faster by centering x.
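We can sketch why centering helps: with a flat prior, the posterior covariance of (alpha, beta) is proportional to the inverse of t(X) %*% X, and its off-diagonal term is proportional to -sum(x), which vanishes when x is centered. A small sketch (the helper ab_cor is ours):

```r
# Correlation implied by solve(t(X) %*% X), which is proportional to the
# posterior covariance of (alpha, beta) under a flat prior
ab_cor <- function(x) {
  X <- cbind(1, x)
  V <- solve(t(X) %*% X)
  V[1, 2] / sqrt(V[1, 1] * V[2, 2])
}
year <- 1952:2013
ab_cor(year)               # almost exactly -1
ab_cor(year - mean(year))  # 0: centering removes the dependency
```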
Here we simply subtract 1982.5 from the year, so that the mean of x is 0. We could also include the centering and the back transformation in the Stan code.
data_lin <-list(M=1,
N = nrow(data_kilpis),
x = matrix(data_kilpis$year-1982.5, ncol=1),
y = data_kilpis[,5])
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
Now there are no treedepth exceedance warnings.
We can check other diagnostics as follows
fit_lin$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-1-809ee4.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-2-809ee4.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-3-809ee4.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-4-809ee4.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
We check \(\widehat{R}\) and ESS values, which are now even better.
draws <- as_draws_rvars(fit_lin$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -38.55 | -38.21 | 1.32 | 1.06 | -41.14 | -37.11 | 1 | 1915 | 2466 |
| alpha | 9.31 | 9.31 | 0.15 | 0.14 | 9.07 | 9.56 | 1 | 3681 | 2701 |
| beta | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 | 0.03 | 1 | 3927 | 2766 |
| sigma | 1.12 | 1.11 | 0.11 | 0.10 | 0.96 | 1.31 | 1 | 3168 | 2454 |
Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
The posterior dependency has disappeared, and alpha can now be interpreted as the average temperature over the observed years.
A toy example of a bimodal distribution. Bimodal distributions can arise for many reasons, for example in mixture models or in models with non-log-concave likelihoods or priors (i.e. with thick tails).
Bimodally distributed data
N=20
y=c(rnorm(N/2, mean=-5, sd=1),rnorm(N/2, mean=5, sd=1));
data_tt <-list(N = N, y = y)
Student’s t model
code_tt <- root("problems", "student.stan")
writeLines(readLines(code_tt))
// student-student model
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
}
model {
mu ~ student_t(4, 0, 100);
y ~ student_t(4, mu, 1);
}
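Because mu is one-dimensional, we can see the bimodality directly by evaluating the unnormalized log posterior on a grid, without any MCMC. A sketch with similarly simulated data (helper names ours):

```r
set.seed(48927)
y <- c(rnorm(10, -5, 1), rnorm(10, 5, 1))
# Unnormalized log posterior of mu: student_t(4, 0, 100) prior and
# student_t(4, mu, 1) likelihood (dt is the standard t density)
logpost <- function(mu) {
  dt(mu / 100, df = 4, log = TRUE) - log(100) +
    sum(dt(y - mu, df = 4, log = TRUE))
}
mu_grid <- seq(-10, 10, by = 0.01)
lp <- sapply(mu_grid, logpost)
# Local maxima of the grid evaluation: one near -5 and one near 5
peaks <- mu_grid[which(diff(sign(diff(lp))) == -2) + 1]
```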
Sample
mod_tt <- cmdstan_model(stan_file = code_tt)
fit_tt <- mod_tt$sample(data = data_tt, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_tt$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/student-202110211823-1-60702b.csv, /tmp/RtmpoIWO9X/student-202110211823-2-60702b.csv, /tmp/RtmpoIWO9X/student-202110211823-3-60702b.csv, /tmp/RtmpoIWO9X/student-202110211823-4-60702b.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
The following parameters had fewer than 0.001 effective draws per transition:
mu
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.
The following parameters had split R-hat greater than 1.05:
mu
Such high values indicate incomplete mixing and biasedestimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.
Processing complete.
High \(\widehat{R}\) and very low ESS. We check the \(\widehat{R}\) and ESS values.
draws <- as_draws_rvars(fit_tt$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -85.21 | -84.92 | 0.72 | 0.30 | -86.76 | -84.63 | 1.04 | 88 | NA |
| mu | -2.42 | -4.51 | 3.91 | 0.62 | -5.23 | 4.63 | 1.53 | 7 | NA |
Histogram shows two modes
mcmc_hist(as_draws_array(draws), pars=c("mu"))
Trace plot shows that the chains are not mixing between the modes
mcmc_trace(as_draws_array(draws), pars=c("mu"))
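What the diagnostics are quantifying can be sketched with a basic (non-rank-normalized) split-\(\widehat{R}\): chains stuck in different modes have a between-chain variance far exceeding the within-chain variance. The posterior package's rhat() is the real tool; split_rhat below is our simplified illustration:

```r
# Basic split-Rhat for a matrix of draws (iterations x chains)
split_rhat <- function(draws) {
  n <- nrow(draws) %/% 2
  halves <- cbind(draws[1:n, ], draws[(n + 1):(2 * n), ])  # split each chain
  W <- mean(apply(halves, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(halves))         # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}
set.seed(48927)
chains <- cbind(rnorm(1000, -5, 0.5),   # chain stuck in the left mode
                rnorm(1000,  5, 0.5))   # chain stuck in the right mode
split_rhat(chains)  # far above the 1.01 threshold
```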
The same example, but with this data the modes are close enough that it’s easy for MCMC to jump from one mode to another.
N=20
y=c(rnorm(N/2, mean=-3, sd=1),rnorm(N/2, mean=3, sd=1));
data_tt <-list(N = N, y = y)
fit_tt <- mod_tt$sample(data = data_tt, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_tt$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/student-202110211823-1-41fb2f.csv, /tmp/RtmpoIWO9X/student-202110211823-2-41fb2f.csv, /tmp/RtmpoIWO9X/student-202110211823-3-41fb2f.csv, /tmp/RtmpoIWO9X/student-202110211823-4-41fb2f.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
We check \(\widehat{R}\) and ESS values, which in this case all look good.
draws <- as_draws_rvars(fit_tt$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -53.69 | -53.65 | 0.44 | 0.20 | -54.45 | -53.27 | 1 | 1161 | 2051 |
| mu | -0.42 | -0.56 | 1.14 | 1.39 | -2.04 | 1.48 | 1 | 646 | 1566 |
Two modes are visible
mcmc_hist(as_draws_array(draws), pars=c("mu"))
The trace plot is not very useful. It shows that the chains are jumping between the modes, but it’s difficult to see whether the jumps happen often enough and the chains are mixing well.
mcmc_trace(as_draws_array(draws), pars=c("mu"))
Rank histogram plot
mcmc_rank_hist(as_draws_array(draws), pars=c("mu"))
Rank ECDF plot
Add here
MCMC requires some initial values. By default Stan generates them randomly from [-2, 2] (in the unconstrained space). Sometimes these initial values can be bad and cause numerical issues. Computers in general use a finite number of bits to represent numbers, and with very small or very large numbers there can be problems representing them at all, or a significant loss of accuracy.
The data is generated from a Poisson regression model. The Poisson intensity parameter has to be positive and usually the latent linear predictor is exponentiated to be positive (the exponentiation can also be justified by multiplicative effects on Poisson intensity).
set.seed(SEED)
M=1;
N=20;
x=1e3*matrix(c(sort(rnorm(N))),ncol=M)
y=rpois(N,exp(1e-3*x[,1]))
data_pois <-list(M = M, N = N, x = x, y = y)
ggplot() +
geom_point(aes(x, y), data = data.frame(data_pois), size = 3)
Poisson regression model with proper priors
code_pois <- root("problems", "pois_glm.stan")
writeLines(readLines(code_pois))
// Poisson regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
}
model {
alpha ~ normal(0,10);
beta ~ normal(0,10);
y ~ poisson_log_glm(x, alpha, beta);
}
Sample
mod_pois <- cmdstan_model(stan_file = code_pois)
fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 1.1 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.3 seconds.
Total execution time: 1.7 seconds.
We get a lot of warnings. Uh, they show in the console, but not in the notebook!
Chain 4 Rejecting initial value:
Chain 4 Log probability evaluates to log(0), i.e. negative infinity.
Chain 4 Stan can't start sampling from this initial value.
There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_pois$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/pois_glm-202110211823-1-230c83.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-2-230c83.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-3-230c83.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-4-230c83.csv
Checking sampler transitions treedepth.
592 of 4000 (15%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
The E-BFMI, 0.0057, is below the nominal threshold of 0.3 which suggests that HMC may have trouble exploring the target distribution.
If possible, try to reparameterize the model.
The following parameters had fewer than 0.001 effective draws per transition:
alpha, beta[1]
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.
The following parameters had split R-hat greater than 1.1:
alpha, beta[1]
Such high values indicate incomplete mixing and biasedestimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.
Processing complete.
We check \(\widehat{R}\) and ESS values.
draws <- as_draws_rvars(fit_pois$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -9.023224e+80 | -1.859e+35 | 1.986335e+81 | 2.756153e+35 | -5.320882e+81 | 9.99 | NA | 5 | NA |
| alpha | -1.400000e-01 | -9.000e-02 | 6.500000e-01 | 1.020000e+00 | -1.080000e+00 | 0.70 | NA | NA | NA |
| beta | -2.000000e-02 | 0.000e+00 | 5.000000e-02 | 3.000000e-02 | -1.100000e-01 | 0.04 | 3.23 | 4 | NA |
Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
The reason for the issue is that the initial value for beta is sampled from [-2, 2] and x has some large values. If the initial value for beta is higher than about 0.3 or lower than -0.4, some of the values of exp(alpha + beta * x) will overflow to Inf.
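The overflow threshold is easy to verify: the largest double is about 1.8e308, so exp overflows just above 709. A quick check:

```r
.Machine$double.xmax  # about 1.8e308
exp(709)              # still finite, about 8.2e307
exp(710)              # Inf: overflow
# With x of order 1000, even a modest beta overflows the Poisson intensity
exp(0.8 * 1000)       # Inf
```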
In this case the problem is alleviated by scaling x.
data_pois <-list(M = M, N = N, x = x/1e3, y = y)
ggplot() +
geom_point(aes(x, y), data = data.frame(data_pois), size = 3)
Poisson regression model with proper priors
code_pois <- root("problems", "pois_glm.stan")
writeLines(readLines(code_pois))
// Poisson regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
}
model {
alpha ~ normal(0,10);
beta ~ normal(0,10);
y ~ poisson_log_glm(x, alpha, beta);
}
Sample
mod_pois <- cmdstan_model(stan_file = code_pois)
fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_pois$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/pois_glm-202110211823-1-1b0c9a.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-2-1b0c9a.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-3-1b0c9a.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-4-1b0c9a.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
We check \(\widehat{R}\) and ESS values.
draws <- as_draws_rvars(fit_pois$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | 9.09 | 9.38 | 0.99 | 0.71 | 7.16 | 10.03 | 1 | 1434 | 1804 |
| alpha | -0.05 | -0.04 | 0.26 | 0.26 | -0.50 | 0.37 | 1 | 1091 | 1154 |
| beta | 1.01 | 1.01 | 0.18 | 0.18 | 0.73 | 1.31 | 1 | 1151 | 1376 |
Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
Everything works fine.
It can sometimes be difficult to find good initial values.
If the initial value warning appears only once, it is possible that MCMC was able to escape the bad region and the rest of the inference is fine.
We expect Pathfinder to help with finding good initial values.
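Meanwhile, initial values can also be supplied manually via the init argument of cmdstanr's $sample() method, for example as a function returning a named list; a sketch (init_fun is our name, and the commented-out call assumes the mod_pois and data_pois objects from above):

```r
set.seed(48927)
# Draw initial values from a narrower range than Stan's default [-2, 2]
init_fun <- function() list(alpha = rnorm(1, 0, 0.1),
                            beta = as.array(rnorm(1, 0, 0.1)))
# fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, init = init_fun)
```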
A simple Bernoulli regression with a proper but thick-tailed prior (Cauchy)
code_logit4 <- root("problems", "logit_glm4.stan")
writeLines(readLines(code_logit4))
// logistic regression
data {
int<lower=0> N;
int<lower=0> M;
int<lower=0,upper=1> y[N];
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
}
model {
alpha ~ cauchy(0, 10);
beta ~ cauchy(0, 10);
y ~ bernoulli_logit_glm(x, alpha, beta);
}
Sample
mod_logit4 <- cmdstan_model(stan_file = code_logit4)
fit_logit4 <- mod_logit4$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.5 seconds.
There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:
fit_logit4$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm4-202110211823-1-159f48.csv, /tmp/RtmpoIWO9X/logit_glm4-202110211823-2-159f48.csv, /tmp/RtmpoIWO9X/logit_glm4-202110211823-3-159f48.csv, /tmp/RtmpoIWO9X/logit_glm4-202110211823-4-159f48.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
We check \(\widehat{R}\) and ESS values, which in this case all look good.
draws <- as_draws_rvars(fit_logit4$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -3.86 | -3.28 | 1.98 | 1.43 | -7.83 | -1.92 | 1.01 | 225 | 138 |
| alpha | 15.59 | 10.23 | 22.65 | 6.82 | 2.70 | 43.35 | 1.02 | 187 | 145 |
| beta | 37.28 | 24.63 | 50.84 | 16.58 | 7.83 | 104.13 | 1.01 | 189 | NA |
ESSs are not that big, but not really alarming.
Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))
The tail is long.
Rank plots
mcmc_rank_hist(as_draws_array(draws), pars=c("alpha"))
Histograms are not really uniform.
Demonstration of what happens if we forget <lower=0> from a parameter that has to be positive.
We simulated x and y independently from normal distributions. As N=8 is small, there will be a lot of uncertainty about the parameters, including the scale sigma.
M=1;
N=8;
set.seed(SEED)
x=matrix(rnorm(N),ncol=M)
y=rnorm(N)/10
data_lin <-list(M = M, N = N, x = x, y = y)
We use a linear regression model with proper priors.
code_lin <- root("problems", "linear_glm.stan")
writeLines(readLines(code_lin))
// linear regression
data {
int<lower=0> N;
int<lower=0> M;
vector[N] y;
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
real sigma;
}
model {
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
sigma ~ normal(0, 1);
y ~ normal_id_glm(x, alpha, beta, sigma);
}
Sample
mod_lin <- cmdstan_model(stan_file = code_lin)
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
We repeatedly get warnings like the following:
Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: normal_id_glm_lpdf: Scale vector is -0.747476, but must be positive finite! (in '/tmp/RtmprEP4gg/model-7caa12ce8e405.stan', line 16, column 2 to column 43)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Sometimes these warnings appear in the early adaptation phase even if the model has been correctly defined, but here we get too many of them, which indicates the sampler is repeatedly trying to jump to infeasible values, which here means negative scale parameter values. Many rejections may lead to biased estimates.
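The frequency of these rejections is easy to understand: with an unconstrained sigma the sampler sees no barrier at zero. A small self-contained sketch (not part of the original analysis) shows that half of the normal(0, 1) prior's mass is on infeasible negative values:

```r
# With no <lower=0> constraint, sigma can take negative values, and the
# normal(0, 1) prior places about half of its mass below zero, so proposals
# often land in the infeasible region and get rejected.
set.seed(1)
sigma_prior <- rnorm(1e5, 0, 1)
mean(sigma_prior < 0)  # approximately 0.5
```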
There are some divergences reported, which is not necessarily alarming, but in this case are likely to be related to sub-optimal stepsize adaptation due to the many rejections. We can also explicitly call CmdStan inference diagnostics:
fit_lin$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/linear_glm-202110211823-1-7310bb.csv, /tmp/RtmpoIWO9X/linear_glm-202110211823-2-7310bb.csv, /tmp/RtmpoIWO9X/linear_glm-202110211823-3-7310bb.csv, /tmp/RtmpoIWO9X/linear_glm-202110211823-4-7310bb.csv
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
22 of 4000 (0.55%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete.
We check \(\widehat{R}\) and ESS values, which in this case are ok.
draws <- as_draws_rvars(fit_lin$draws())
summarize_draws(draws)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | 18.09 | 18.63 | 1.96 | 1.53 | 13.94 | 20.17 | 1 | 765 | 770 |
| alpha | 0.05 | 0.05 | 0.03 | 0.02 | 0.01 | 0.10 | 1 | 1602 | 1421 |
| beta | 0.04 | 0.04 | 0.02 | 0.02 | 0.00 | 0.07 | 1 | 1455 | 1164 |
| sigma | 0.07 | 0.06 | 0.03 | 0.02 | 0.04 | 0.13 | 1 | 1100 | 801 |
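These checks can also be automated. A minimal sketch of a programmatic convergence check with the posterior package, using the common rule-of-thumb thresholds \(\widehat{R} < 1.01\) and ESS > 400 (roughly 100 per chain with 4 chains); the package's built-in example draws are used so the snippet runs on its own:

```r
library(posterior)
# Programmatic convergence check on the built-in example draws
# (substitute fit_lin$draws() to check the fit above).
s <- summarise_draws(example_draws())
all(s$rhat < 1.01, na.rm = TRUE)     # rule of thumb for convergence
all(s$ess_bulk > 400, na.rm = TRUE)  # rule of thumb for reliable estimates
all(s$ess_tail > 400, na.rm = TRUE)
```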
Plots
mcmc_hist(as_draws_array(draws), pars=c("sigma"))+xlim(c(0,0.31))
The fixed model includes the <lower=0> constraint for sigma.
code_lin2 <- root("problems", "linear_glm2.stan")
writeLines(readLines(code_lin2))
// linear regression
data {
int<lower=0> N;
int<lower=0> M;
vector[N] y;
matrix[N,M] x;
}
parameters {
real alpha;
vector[M] beta;
real<lower=0> sigma;
}
model {
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
sigma ~ normal(0, 1);
y ~ normal_id_glm(x, alpha, beta, sigma);
}
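Under the hood, Stan implements the <lower=0> constraint by sampling an unconstrained parameter and mapping it through exp(), adding the log absolute Jacobian of the transform to the target density, so the sampler never visits infeasible values. A minimal sketch of the transform (variable names are illustrative):

```r
# Stan's <lower=0> transform: the sampler works on the unconstrained scale
# and maps to the constrained scale with exp(), so sigma is always positive.
sigma_raw <- -0.75        # an unconstrained value the sampler might propose
sigma <- exp(sigma_raw)   # constrained value, strictly positive
log_jacobian <- sigma_raw # log |d(sigma)/d(sigma_raw)| = log(exp(sigma_raw)) = sigma_raw
c(sigma = sigma, log_jacobian = log_jacobian)
```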
Sample
mod_lin2 <- cmdstan_model(stan_file = code_lin2)
fit_lin2 <- mod_lin2$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
No sampling warnings. We check \(\widehat{R}\) and ESS values, which are ok.
draws2 <- as_draws_rvars(fit_lin2$draws())
summarize_draws(draws2)
| variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | 15.51 | 15.93 | 1.59 | 1.30 | 12.42 | 17.24 | 1 | 1130 | 1507 |
| alpha | 0.05 | 0.05 | 0.03 | 0.02 | 0.00 | 0.09 | 1 | 1891 | 1749 |
| beta | 0.04 | 0.04 | 0.02 | 0.02 | 0.00 | 0.07 | 1 | 1964 | 1752 |
| sigma | 0.07 | 0.06 | 0.03 | 0.02 | 0.04 | 0.12 | 1 | 1415 | 1822 |
Plots
mcmc_hist(as_draws_array(draws2), pars=c("sigma"))+xlim(c(0,0.31))
In this specific case the bias is negligible, but sampling with the proper constraint is more efficient.
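To see this directly, the sigma summaries from the two fits can be placed side by side (a sketch assuming the draws and draws2 objects created above):

```r
library(posterior)
library(dplyr)
# Compare the sigma posterior from the unconstrained fit (draws) and the
# constrained fit (draws2); the summaries are nearly identical, but the
# constrained fit achieves higher ESS.
bind_rows(
  unconstrained = summarise_draws(subset_draws(draws,  variable = "sigma")),
  constrained   = summarise_draws(subset_draws(draws2, variable = "sigma")),
  .id = "model"
)
```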